For my capstone project, I selected a dataset from https://www.kaggle.com/. It includes data on different mental health disorders for countries across the globe, with yearly records ranging from 1990 to 2020.
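library(readr)  # provides read_csv()
# Extract the downloaded archive into a "Mental_Health" folder, then read the CSV from that folder
unzip("F:/Chrome Downloads/archive (1).zip", exdir = "Mental_Health", unzip = "internal")
setwd("E:/Data Analysis using R--Google, Coursera/Mental_Health")
data <- read_csv("mental-health-disorders.csv")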
## Rows: 6840 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Country, Code
## dbl (8): Year, Schizophrenia, Bipolar_disorder, Eating_disorders, Anxiety_di...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 10
## Country Code Year Schizophrenia Bipolar_disorder Eating_disorders
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1990 0.229 0.721 0.131
## 2 Afghanistan AFG 1991 0.228 0.720 0.126
## 3 Afghanistan AFG 1992 0.227 0.718 0.122
## 4 Afghanistan AFG 1993 0.226 0.717 0.118
## 5 Afghanistan AFG 1994 0.226 0.717 0.115
## 6 Afghanistan AFG 1995 0.225 0.717 0.111
## # ℹ 4 more variables: Anxiety_disorders <dbl>, Drug_use_disorders <dbl>,
## # Depressive_disorders <dbl>, Alcohol_use_disorders <dbl>
For my first visualization, I explored the relationships between different mental health disorders. A strong correlation was detected between eating disorders and anxiety, as well as between schizophrenia and eating disorders. Bipolar disorder was correlated with both eating disorders and anxiety.
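A minimal sketch of how such pairwise correlations could be computed. It uses only the six rows and three disorder columns visible in the `head(data)` output above, so the numbers it prints describe Afghanistan in 1990 to 1995 and not the full dataset. The object names `mh` and `cor_mat` are placeholders, and the base-R `heatmap()` call is an assumption; the original figure may have been drawn with a different package.

```r
# First six rows of the dataset, copied from the head(data) output
# (Afghanistan, 1990-1995); only three disorder columns are visible there.
mh <- data.frame(
  Schizophrenia    = c(0.229, 0.228, 0.227, 0.226, 0.226, 0.225),
  Bipolar_disorder = c(0.721, 0.720, 0.718, 0.717, 0.717, 0.717),
  Eating_disorders = c(0.131, 0.126, 0.122, 0.118, 0.115, 0.111)
)

# Pearson correlation between every pair of numeric columns
cor_mat <- cor(mh, method = "pearson")
print(round(cor_mat, 2))

# Simple base-R heatmap of the matrix, without clustering or rescaling
heatmap(cor_mat, Rowv = NA, Colv = NA, symm = TRUE, scale = "none",
        main = "Correlation between disorders")
```

On the full data, the same call would be applied to all the disorder columns at once, for example after dropping `Country`, `Code`, and `Year`.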
For this visual, I selected alcohol use and drug use disorders for the year 2018 across the globe and checked their relationship using a linear model. There turned out to be a weak linear relationship, which suggests that countries with a higher prevalence of alcohol use disorders may also have a somewhat higher prevalence of drug use disorders. I presented the data using an interactive plotly widget so that the audience can interact with and evaluate the data.
## `geom_smooth()` using formula = 'y ~ x'
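The following is a rough sketch of how a linear model and an interactive scatter plot of this kind could be put together. The data frame `toy` is simulated (30 hypothetical countries with made-up values), so the fitted numbers say nothing about the real dataset; only the column names are taken from it. Passing `formula = y ~ x` explicitly to `geom_smooth()` avoids the message shown above.

```r
library(ggplot2)
library(plotly)

# Simulated stand-in for the 2018 subset: hypothetical countries, made-up values
set.seed(1)
toy <- data.frame(
  Country = paste("Country", 1:30),
  Year = 2018,
  Alcohol_use_disorders = runif(30, 0.5, 4)
)
toy$Drug_use_disorders <- 0.6 + 0.1 * toy$Alcohol_use_disorders + rnorm(30, sd = 0.4)

# Fit the linear model; R-squared indicates how strong the linear relationship is
fit <- lm(Drug_use_disorders ~ Alcohol_use_disorders, data = toy)
r_squared <- summary(fit)$r.squared
print(r_squared)

# Scatter plot with the fitted line and its confidence band
p <- ggplot(toy, aes(x = Alcohol_use_disorders, y = Drug_use_disorders)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Alcohol use disorders", y = "Drug use disorders")

# Convert the static plot into an interactive plotly widget
widget <- ggplotly(p)
```

Printing `widget` in an interactive session or an R Markdown chunk displays the plot. An R-squared close to 0 corresponds to a weak linear relationship.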
Here, I aimed to visualize which mental health issue ranks highest across the globe. According to the available data, anxiety ranked first in every year, with depression in second place. I presented the data in the form of a bar graph. The data also revealed that eating disorders are less prevalent than the other disorders.
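One way such a ranking could be produced is sketched below. The values in `toy` are random numbers, so the resulting order is not a finding; only the seven disorder column names come from the dataset. This sketch averages over all rows, whereas a per-year ranking would compute the same means within each `Year`.

```r
library(ggplot2)

disorders <- c("Schizophrenia", "Bipolar_disorder", "Eating_disorders",
               "Anxiety_disorders", "Drug_use_disorders",
               "Depressive_disorders", "Alcohol_use_disorders")

# Random placeholder values, for illustration only
set.seed(42)
toy <- as.data.frame(
  matrix(runif(10 * length(disorders), 0, 5), nrow = 10,
         dimnames = list(NULL, disorders))
)

# Mean of each disorder column, sorted from highest to lowest
avg <- colMeans(toy[disorders], na.rm = TRUE)
ranking <- data.frame(disorder = names(avg), mean_prevalence = unname(avg))
ranking <- ranking[order(-ranking$mean_prevalence), ]
print(ranking)

# Horizontal bar graph with the highest-ranked disorder at the top
p <- ggplot(ranking, aes(x = reorder(disorder, mean_prevalence), y = mean_prevalence)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Mean prevalence")
print(p)
```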
Next, I evaluated schizophrenia prevalence on three continents by subsetting the data and filtering for the latest year. I also sampled data for 25 countries. The animated dumbbell graph shows that schizophrenia is most prevalent in North America, with the USA taking the lead, followed closely by Asia and finally Europe.
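The data preparation described here (subset, filter for the latest year, sample 25 countries) might look roughly like the sketch below. The dataset has no continent column, so the `lookup` table is a hypothetical mapping, and all values in `toy` are simulated. The animation and the dumbbell plot itself are not reproduced.

```r
# Simulated data: 40 hypothetical countries over three years
set.seed(123)
countries <- paste("Country", 1:40)
toy <- expand.grid(Country = countries, Year = 2018:2020, stringsAsFactors = FALSE)
toy$Schizophrenia <- runif(nrow(toy), 0.2, 0.4)

# Hypothetical country-to-continent lookup (not part of the original dataset)
lookup <- data.frame(
  Country = countries,
  Continent = rep(c("Asia", "Europe", "North America", "Africa"), each = 10)
)

# Keep only the most recent year, then attach the continent
latest <- toy[toy$Year == max(toy$Year), ]
latest <- merge(latest, lookup, by = "Country")

# Restrict to the three continents of interest and sample 25 countries
three <- latest[latest$Continent %in% c("Asia", "Europe", "North America"), ]
sampled <- three[sample(nrow(three), 25), ]

# Mean prevalence per continent in the sample
print(aggregate(Schizophrenia ~ Continent, data = sampled, FUN = mean))
```

Calling `set.seed()` before `sample()` keeps the sampled countries the same between runs, which matters when the figure is regenerated.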
I plotted a line graph to show the trend of depression in South Asian countries. For countries like Afghanistan and Pakistan, the trend remained constant, but for the Maldives and Sri Lanka it shows a decline. In some countries, such as Bhutan and Nepal, there was an increase followed by a sudden decline.
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
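The deprecation warning above goes away when the line thickness is set with `linewidth` instead of `size`, which requires ggplot2 3.4.0 or later. A small sketch with simulated values for three hypothetical countries (not the real South Asian series):

```r
library(ggplot2)

# Simulated yearly values for three hypothetical countries
set.seed(7)
toy <- expand.grid(Year = 1990:2019,
                   Country = c("Country A", "Country B", "Country C"),
                   stringsAsFactors = FALSE)
toy$Depressive_disorders <- 3.5 + rnorm(nrow(toy), sd = 0.1)

# One line per country; `linewidth` replaces the deprecated `size` for lines
p <- ggplot(toy, aes(x = Year, y = Depressive_disorders, colour = Country)) +
  geom_line(linewidth = 1) +
  labs(x = "Year", y = "Depressive disorders")
print(p)
```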
As anxiety ranked highest, I visualized it over time for three continents using an interactive point plot, which shows its gradual increase or decrease.
According to the box plot, eating disorders were rare in Asia, where the distribution is narrow. For North America the data were more widely spread, but the mean was lower. Lastly, eating disorders were most prevalent in Europe, although there were many outliers, meaning that in some European countries the percentage of eating disorders is not high.
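The quantities read off a box plot (centre, spread, outliers) can also be computed directly, which makes comparisons like the ones above easier to check. The sketch below uses simulated values, and the continent labels are attached by hand because the dataset has no continent column. Note that the line inside a default box plot is the median, so the mean is computed separately.

```r
# Simulated prevalence values for three continents (illustration only)
set.seed(99)
toy <- data.frame(
  Continent = rep(c("Asia", "Europe", "North America"), each = 20),
  Eating_disorders = c(runif(20, 0.1, 0.2), runif(20, 0.2, 0.6), runif(20, 0.1, 0.5))
)

# Per-continent summary: median, hinge spread (roughly the IQR), mean, outlier count
stats_by_continent <- lapply(split(toy$Eating_disorders, toy$Continent), function(x) {
  s <- boxplot.stats(x)
  c(median = s$stats[3],
    hinge_spread = s$stats[4] - s$stats[2],
    mean = mean(x),
    n_outliers = length(s$out))
})
summary_tbl <- do.call(rbind, stats_by_continent)
print(round(summary_tbl, 3))

# The corresponding box plot
boxplot(Eating_disorders ~ Continent, data = toy,
        ylab = "Eating disorders")
```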
For my final visual, I plotted geospatial data by joining my dataset with the world map data available in the R package maps, as my dataset lacked such information. I tried to visualize in which countries drug usage is on the rise, and with it drug use disorders.
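A join of this kind could be sketched as follows, assuming the `maps` package is installed so that `ggplot2::map_data("world")` works. The three values in `toy` are placeholders and not real prevalence figures. Country names in the dataset may not always match the `region` names in the map data, so the sketch also lists any names that fail to match.

```r
library(ggplot2)
library(dplyr)

# World polygons; map_data() needs the maps package to be installed
world <- map_data("world")

# Placeholder values for three countries (not real prevalence figures)
toy <- data.frame(
  Country = c("Afghanistan", "Pakistan", "India"),
  Drug_use_disorders = c(1, 2, 3)
)

# Countries in the data whose names have no counterpart in the map data
unmatched <- setdiff(toy$Country, world$region)
print(unmatched)

# Attach the values to the polygons; unmatched regions get NA
map_df <- left_join(world, toy, by = c("region" = "Country"))

# Choropleth map; regions without data are drawn in grey
p <- ggplot(map_df, aes(x = long, y = lat, group = group, fill = Drug_use_disorders)) +
  geom_polygon(colour = "grey40", linewidth = 0.1) +
  coord_quickmap() +
  scale_fill_continuous(na.value = "grey90") +
  labs(fill = "Drug use disorders")
print(p)
```

Any names reported in `unmatched` would need to be recoded before the join so that those countries are not left blank on the map.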
The dataset can be subset further by continent to better visualize mental health disorders. More compelling graphs can increase awareness of mental health issues across the globe. A Shiny app or a flexdashboard would allow more user-friendly interaction with the data. That is all for this project. Due to time limitations and the scope of the project, I was unable to create more figures, but I hope I have conveyed my message properly and raised some awareness about this topic.